Automatic Learning of Context-Free Grammar
Authors
Abstract
In this paper we study the problem of learning a context-free grammar from a corpus. We investigate a technique based on the minimum description length (MDL) of the corpus. The cost of a grammar is defined as the sum of the number of bits required to represent the grammar and the number of bits required to encode the derivation of the corpus using that grammar. On the Academia Sinica Balanced Corpus with part-of-speech tags, the overall cost, or description length, is reduced by as much as 14% relative to the initial cost. In addition to the experimental results, we present a novel analysis of the costs of two special context-free grammars: one that derives only the strings in the corpus, and one that derives all strings over the alphabet.
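The abstract does not say how grammars and derivations are encoded, so the following is only a minimal sketch of an MDL cost under assumed encodings: each rule is written as its left-hand side, its right-hand-side symbols, and an end marker, with every symbol given a fixed-length code of log2(|N| + |T| + 1) bits, and each derivation step is charged log2 of the number of alternatives for the nonterminal being expanded. The function names, the grammar representation, and the toy tag corpus are hypothetical illustrations, not the paper's method or data. The two helper constructors build the two special grammars mentioned above: one that derives exactly the corpus sentences, and one that derives every string over the tag alphabet.

```python
import math
from typing import Dict, List, Sequence, Set, Tuple

# Hypothetical representation: nonterminal -> list of right-hand sides.
Grammar = Dict[str, List[Tuple[str, ...]]]
# A leftmost derivation as a list of (nonterminal, index of rule chosen).
Derivation = List[Tuple[str, int]]


def grammar_bits(grammar: Grammar, terminals: Set[str]) -> float:
    """Bits to write every rule as LHS, RHS symbols, then an end marker.

    Assumed code: every symbol (nonterminal, terminal, or end marker)
    gets a fixed-length code of log2(|N| + |T| + 1) bits.
    """
    bits_per_symbol = math.log2(len(grammar) + len(terminals) + 1)
    n_symbols = sum(len(rhs) + 2 for rhss in grammar.values() for rhs in rhss)
    return n_symbols * bits_per_symbol


def derivation_bits(grammar: Grammar, derivation: Derivation) -> float:
    """Bits to name each rule choice, assuming a uniform code over the
    alternatives of the nonterminal being expanded."""
    return sum(math.log2(len(grammar[lhs])) for lhs, _ in derivation)


def description_length(grammar: Grammar, terminals: Set[str],
                       derivations: Sequence[Derivation]) -> float:
    """MDL cost: grammar bits plus the bits of every corpus derivation."""
    return grammar_bits(grammar, terminals) + sum(
        derivation_bits(grammar, d) for d in derivations)


def expand(grammar: Grammar, derivation: Derivation,
           start: str = "S") -> Tuple[str, ...]:
    """Replay a leftmost derivation and return the derived string."""
    form = [start]
    for lhs, idx in derivation:
        pos = next(i for i, sym in enumerate(form) if sym in grammar)
        assert form[pos] == lhs, "derivation is not leftmost"
        form[pos:pos + 1] = grammar[lhs][idx]
    return tuple(form)


def corpus_grammar(corpus):
    """S -> w for each distinct sentence w: derives exactly the corpus."""
    sentences = sorted(set(corpus))
    grammar = {"S": sentences}
    derivations = [[("S", sentences.index(s))] for s in corpus]
    return grammar, derivations


def universal_grammar(corpus, terminals):
    """S -> a S for each terminal a, plus S -> epsilon: derives all of T*."""
    alphabet = sorted(terminals)
    grammar = {"S": [(a, "S") for a in alphabet] + [()]}
    epsilon_rule = len(alphabet)
    derivations = [[("S", alphabet.index(a)) for a in s] + [("S", epsilon_rule)]
                   for s in corpus]
    return grammar, derivations


# Toy corpus of part-of-speech tag sequences (made up for illustration).
corpus = [("Nh", "VC", "Na"), ("Nh", "VC", "Na"), ("Na", "VH")]
terminals = {tag for sentence in corpus for tag in sentence}

for name, (g, ds) in [("corpus-only", corpus_grammar(corpus)),
                      ("universal", universal_grammar(corpus, terminals))]:
    assert [expand(g, d) for d in ds] == corpus
    print(f"{name}: {description_length(g, terminals, ds):.2f} bits")
```

On this toy corpus the script prints 26.26 bits for the corpus-only grammar and 72.07 bits for the universal grammar. Under these assumed encodings, the corpus-only grammar grows with the number of distinct sentences, while the universal grammar's size depends only on the alphabet but charges (n + 1) log2(|T| + 1) bits to derive each sentence of length n; the two grammars sit at the memorizing and overgeneralizing ends of the trade-off that MDL balances.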
Similar Resources
Automatic Melodic Reduction Using a Supervised Probabilistic Context-Free Grammar
This research explores a Natural Language Processing technique utilized for the automatic reduction of melodies: the Probabilistic Context-Free Grammar (PCFG). Automatic melodic reduction was previously explored by means of a probabilistic grammar [11] [1]. However, each of these methods used unsupervised learning to estimate the probabilities for the grammar rules, and thus a corpus-based evalu...
Automatic Grammar Acquisition
We describe a series of three experiments in which supervised learning techniques were used to acquire three different types of grammars for English news stories. The acquired grammar types were: 1) context-free, 2) context-dependent, and 3) probabilistic context-free. Training data were derived from University of Pennsylvania Treebank parses of 50 Wall Street Journal articles. In each case, th...
Natural Language Grammar Induction Using a Constituent-Context Model
This paper presents a novel approach to the unsupervised learning of syntactic analyses of natural language text. Most previous work has focused on maximizing likelihood according to generative PCFG models. In contrast, we employ a simpler probabilistic model over trees based directly on constituent identity and linear context, and use an EM-like iterative procedure to induce structure. This me...
Studying impressive parameters on the performance of Persian probabilistic context free grammar parser
In linguistics, a tree bank is a parsed text corpus that annotates syntactic or semantic sentence structure. The exploitation of tree bank data has been important ever since the first large-scale tree bank, The Penn Treebank, was published. However, although originating in computational linguistics, the value of tree bank is becoming more widely appreciated in linguistics research as a whole. F...
σGTTM III: Learning-Based Time-Span Tree Generator Based on PCFG
We propose an automatic analyzer for acquiring a time-span tree based on the generative theory of tonal music (GTTM). Although analyzer based on GTTM was previously proposed, it requires manually tweaking the 46 adjustable parameters on a computer screen in order to analyze them properly. We reformalized the time-span reduction in GTTM based on a statistical model called probabilistic context-f...
Publication date: 2006